---
title: Introduction to Buffer Overflow
show-content: 1
layout: console
---

# Introduction

In computer security and programming, a **buffer overflow**, or **buffer
overrun**, is an anomaly where a program, while writing data to a buffer,
overruns the buffer's boundary and overwrites adjacent memory. This is a special
case of violation of memory safety.

Buffer overflows can be triggered by inputs that are designed to execute code,
or alter the way the program operates. This may result in erratic program
behaviour, including memory access errors, incorrect results, a crash or a
breach of system security. They are thus the basis of many software
vulnerabilities and can be maliciously exploited.

Programming languages commonly associated with buffer overflows include _C_ and
_C++_, which provide no built-in protection against accessing or overwriting
data in any part of memory and do not automatically check that data written to
an array (the built-in buffer type) is within the boundaries of that array.
Bounds checking can prevent buffer overflows.

You probably have a lot of questions:

**What causes this crazy bug?**

It's a programming error.

**Is it important to learn or is it just a small and stupid exploit?**

This type of exploit is what makes the difference between professional and
normal hackers. I will explain it later.

**Can you give me a technical description?**

A buffer overflow occurs when data written to a buffer, due to insufficient
bounds checking, corrupts data values in memory addresses adjacent to the
allocated buffer. Most commonly this occurs when copying strings from one buffer
to another.

## Basic example

In the following example, a program has defined two data items which are
adjacent in memory: an 8-byte-long string buffer (**A**) and a two-byte integer
(**B**). Initially, _A_ contains nothing but zero bytes, and **B** contains the
number **1979**. Characters are one byte wide:

| Variable name | A                       | B     |
| ------------- | ----------------------- | ----- |
| Value         | [null string]           | 1979  |
| Hex value     | 00 00 00 00 00 00 00 00 | 07 BB |

Now, the program attempts to store the null-terminated string "excessive" in the
A buffer. By failing to check the length of the string, it overwrites the value
of B:

| Variable name | A                               | B     |
| ------------- | ------------------------------- | ----- |
| Value         | 'e' 'x' 'c' 'e' 's' 's' 'i' 'v' | 25856 |
| Hex value     | 65 78 63 65 73 73 69 76         | 65 00 |

Although the programmer did not intend to change B at all, B's value has now
been replaced by a number formed from part of the character string. In this
example, on a big-endian system that uses ASCII, "e" followed by a zero byte
would become the number 25856. If B was the only other variable data item
defined by the program, writing an even longer string that went past the end of
B could cause an error such as a segmentation fault, terminating the process.

## Buffer overflow exploits

Let's talk about them.

A buffer overflow problem is rooted in the memory where the program stores its
data.

**Why is that?**

What a buffer overflow does is overwrite specific memory locations that hold
something the program relies on, replacing it with something you chose, so that
the program does what you want.

Let's follow a program and try to find and fix the buffer overflow:

# Discovering and attacking buffer overflows

The thing you should know is that everyone knows how to use them. Just go to
sites like SecurityFocus, Exploit DB, Fyodor's Exploit World or Injector,
download an exploit, run it and then get busted. But why doesn't everybody write
exploits and shellcode? Well, the problem is that many people don't know how
to spot a vulnerability in source code or, even if they can, they are not
able to write an exploit.

Let's take a look at the following code:

```c
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
        char *somevar;
        char *important;
        somevar = (char *)malloc(sizeof(char)*4);
        important = (char *)malloc(sizeof(char)*14);
        strcpy(important, "command"); // This one is the important variable
        strcpy(somevar, argv[1]); // Unchecked copy of user input: the bug
}
```

So, let's say that the variable "_important_" stores some system command like,
say, "_chmod o-r file_", and since that file is owned by root, the program runs
as root too. This means that if you can send commands to it, you can execute ANY
system command (_mkdir_, _ls -la_, _cd ..._). You will play with the server like
a doll, so you can start thinking
"_how the hell can I put something that I want in the important variable?_".
Well, the way is to overflow the memory so we can reach it. But first, let's
look at the variables' memory addresses.

To do that, you need to re-write the code. Check the following one:

```c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
        char *somevar;
        char *important;
        somevar = (char *) malloc(sizeof(char) * 4);
        important = (char *) malloc(sizeof(char) * 14);
        printf("%p\n%p\n", somevar, important);
        exit(0);
}
```

Well, I just added two new lines to the source code and dropped the two
`strcpy()` calls, which are not needed to see the addresses. Let's see what
these two lines do:

- The `printf("%p\n%p\n", somevar, important)` line prints the memory
  addresses that _somevar_ and _important_ point to.

- The `exit(0)` ends the program right there. After all, you don't need the
  rest of it for now; your goal was to find out where the variables are stored.

After running the program, you would get an output like the following (you
will probably not get the same memory addresses):

```
0x5556d165b2a0 <---- This is the address of somevar
0x5556d165b2c0 <---- This is the address of important
```

As we can see, _important_ sits right after _somevar_ in memory, 0x20 (32)
bytes further on. This lets us use our buffer overflow skills, since somevar is
filled from "_argv[1]_". Now we know that one follows the other, but let's check
each memory address so we can get a precise picture of how the data is stored.
To do this, let's rewrite the code again:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
        char *somevar;
        char *important;
        char *temp; /* We'll need another variable */

        somevar = (char *) malloc(sizeof(char) * 4);
        important = (char *) malloc(sizeof(char) * 14);
        strcpy(important, "command"); /* This one is the important variable */
        strcpy(somevar, argv[1]); /* Unchecked copy of user input */
        printf("%p\n%p\n", somevar, important);
        printf("Starting to print memory addresses:\n");

        temp = somevar; // This will put temp at the first memory address we want

        while (temp < important + 14) {
                /**
                 * This loop stops once we pass the last memory address we
                 * want, the last byte of the important variable. Each line
                 * shows the address, the byte stored there as a character,
                 * and the four bytes starting there as a hex number.
                 */
                printf("%p: %c (0x%x)\n", temp, *temp, *(unsigned int *)temp);
                temp++;
        }

        exit(0);
}
```

Now let's say that in normal use argv[1] should be `send`. So you just type at
your prompt:

```bash
gcc overflow.c -o overflow
./overflow send
```

You'll get an output like:

```
0x55c8cf4c82a0
0x55c8cf4c82c0
Starting to print memory addresses:
0x55c8cf4c82a0: s (0x646e6573) <-- somevar
0x55c8cf4c82a1: e (0x646e65) <-- somevar
0x55c8cf4c82a2: n (0x646e) <-- somevar
0x55c8cf4c82a3: d (0x64) <-- somevar
0x55c8cf4c82a4:  (0x0)
0x55c8cf4c82a5:  (0x0)
0x55c8cf4c82a6:  (0x0)
0x55c8cf4c82a7:  (0x0)
0x55c8cf4c82a8:  (0x0)
0x55c8cf4c82a9:  (0x0)
0x55c8cf4c82aa:  (0x0)
0x55c8cf4c82ab:  (0x0)
0x55c8cf4c82ac:  (0x0)
0x55c8cf4c82ad:  (0x0)
0x55c8cf4c82ae:  (0x0)
0x55c8cf4c82af:  (0x0)
0x55c8cf4c82b0:  (0x0)
0x55c8cf4c82b1:  (0x0)
0x55c8cf4c82b2:  (0x0)
0x55c8cf4c82b3:  (0x0)
0x55c8cf4c82b4:  (0x0)
0x55c8cf4c82b5:  (0x21000000)
0x55c8cf4c82b6:  (0x210000)
0x55c8cf4c82b7:  (0x2100)
0x55c8cf4c82b8: ! (0x21)
0x55c8cf4c82b9:  (0x0)
0x55c8cf4c82ba:  (0x0)
0x55c8cf4c82bb:  (0x0)
0x55c8cf4c82bc:  (0x0)
0x55c8cf4c82bd:  (0x63000000)
0x55c8cf4c82be:  (0x6f630000)
0x55c8cf4c82bf:  (0x6d6f6300)
0x55c8cf4c82c0: c (0x6d6d6f63) <-- important
0x55c8cf4c82c1: o (0x616d6d6f) <-- important
0x55c8cf4c82c2: m (0x6e616d6d) <-- important
0x55c8cf4c82c3: m (0x646e616d) <-- important
0x55c8cf4c82c4: a (0x646e61) <-- important
0x55c8cf4c82c5: n (0x646e) <-- important
0x55c8cf4c82c6: d (0x64) <-- important
0x55c8cf4c82c7:  (0x0)
0x55c8cf4c82c8:  (0x0)
0x55c8cf4c82c9:  (0x0)
0x55c8cf4c82ca:  (0x0)
0x55c8cf4c82cb:  (0x0)
0x55c8cf4c82cc:  (0x0)
0x55c8cf4c82cd:  (0x0)
```

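Nice, isn't it? You can now see that 28 bytes (`0x...2a4` through `0x...2bf`)
lie between the 4 bytes of somevar and the start of important. Most of them are
empty; the `!` (0x21) at `0x...2b8` is most likely the heap allocator's own
bookkeeping rather than program data. So, let's say that you run the program
with a command line like: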

```
./overflow send---------------------------newcommand
```

You'll get an output like:

```
0x563d882382a0
0x563d882382c0
Starting to print memory addresses:
0x563d882382a0: s (0x646e6573) <-- somevar
0x563d882382a1: e (0x2d646e65) <-- somevar
0x563d882382a2: n (0x2d2d646e) <-- somevar
0x563d882382a3: d (0x2d2d2d64) <-- somevar
0x563d882382a4: - (0x2d2d2d2d)
0x563d882382a5: - (0x2d2d2d2d)
0x563d882382a6: - (0x2d2d2d2d)
0x563d882382a7: - (0x2d2d2d2d)
0x563d882382a8: - (0x2d2d2d2d)
0x563d882382a9: - (0x2d2d2d2d)
0x563d882382aa: - (0x2d2d2d2d)
0x563d882382ab: - (0x2d2d2d2d)
0x563d882382ac: - (0x2d2d2d2d)
0x563d882382ad: - (0x2d2d2d2d)
0x563d882382ae: - (0x2d2d2d2d)
0x563d882382af: - (0x2d2d2d2d)
0x563d882382b0: - (0x2d2d2d2d)
0x563d882382b1: - (0x2d2d2d2d)
0x563d882382b2: - (0x2d2d2d2d)
0x563d882382b3: - (0x2d2d2d2d)
0x563d882382b4: - (0x2d2d2d2d)
0x563d882382b5: - (0x2d2d2d2d)
0x563d882382b6: - (0x2d2d2d2d)
0x563d882382b7: - (0x2d2d2d2d)
0x563d882382b8: - (0x2d2d2d2d)
0x563d882382b9: - (0x2d2d2d2d)
0x563d882382ba: - (0x2d2d2d2d)
0x563d882382bb: - (0x2d2d2d2d)
0x563d882382bc: - (0x6e2d2d2d)
0x563d882382bd: - (0x656e2d2d)
0x563d882382be: - (0x77656e2d)
0x563d882382bf: n (0x6377656e)
0x563d882382c0: e (0x6f637765) <-- important
0x563d882382c1: w (0x6d6f6377) <-- important
0x563d882382c2: c (0x6d6d6f63) <-- important
0x563d882382c3: o (0x616d6d6f) <-- important
0x563d882382c4: m (0x6e616d6d) <-- important
0x563d882382c5: m (0x646e616d) <-- important
0x563d882382c6: a (0x646e61) <-- important
0x563d882382c7: n (0x646e) <-- important
0x563d882382c8: d (0x64) <-- important
0x563d882382c9:  (0x0)
0x563d882382ca:  (0x0)
0x563d882382cb:  (0x0)
0x563d882382cc:  (0x0)
0x563d882382cd:  (0x0)
```

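The new command got written over `command`. Now the program does something you
want, instead of what it was supposed to do. Look closely, though: with 27
dashes the `n` lands at `0x...2bf`, one byte before important, so important
actually holds `ewcommand`. One more dash (28 in total) lines `newcommand` up
exactly with the start of important.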

**NOTE**: Remember, sometimes the space between somevar and important holds
other variables instead of being empty, so check their values and write the
same values back to the same addresses, or the program can crash before it gets
to the variable that you modified.

Now let's think a little.

**Why does this happen?**

As you can see in the source code, somevar is allocated before important, which
most of the time places somevar first in memory. Now, let's check how each one
gets its value.

Somevar gets its value from _argv[1]_ and important gets it from a fixed string,
both through the _strcpy()_ function. The real problem is that important's value
is assigned first, so the later, unchecked copy into somevar can run past its
end and overwrite "_important_". This program could be protected against this
particular attack by switching those two lines, becoming:

```c
strcpy(somevar, argv[1]);
strcpy(important, "command");
```

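If the program had been written this way, even if you give an argument that
reaches the memory address of important, it will be overwritten by the true
command, since the value `command` is assigned to important after somevar is
filled. The overflow itself still happens and still corrupts whatever lies in
between, so the real fix is to check the length of the input before copying it.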

This kind of buffer overflow is a heap buffer overflow. As you have probably
seen, they are really easy to do in theory, but in the real world it's not that
easy. After all, the example I gave was a really dumb program, right? It's a
real pain to find those important variables, and to overflow one of them you
need to be able to write to a buffer that sits at a lower memory address.

Buffer overflow is like a sea: if you are really interested and want to learn
everything about it, you can check the entry on [Wikipedia](https://en.wikipedia.org/wiki/Buffer_overflow).
